The Hidden Power of BCD Instructions
While programming a bit with x86 ASM, I discovered that AAx instructions are just great. Yes, I'm talking about those weird BCD (Binary Coded Decimal) instructions made for programmers who can't count in hex...
Most if not all Intel processors can count in decimal, storing integers between 0 and 9 in four-bit units, and performing add, sub, mul and div with them. In many books or asm tutorials, those instructions are considered 'inefficient' or 'completely useless'.
Let's take a look at them:
AAA : ASCII Adjust after Addition
AAS : ASCII Adjust after Subtraction
AAM : ASCII Adjust after Multiplication
AAD : ASCII Adjust before Division
Although they are second rank instructions, Intel reserved four opcodes (actually six if you include DAA and DAS) and one flag (auxilliary carry) for them... that deserves some attention.
Unfortunately, none of these instructions is correctly documented by Intel! You may be already aware of that, but I'll explain what those instructions do exactly.
Then, we will look at what can be done with those pretty nice instructions, especially when doing size optimization...
AAA - AAS
These instructions 'ASCII adjust AL after addition/subtraction, using AF to determine whether there was a carry'.
Let me explain what this AF flag is. AF stands for Auxiliary Carry. The AF flag is set when an arithmetic operation generates a carry or a borrow out of bit 3 of the result.
For example 5+2=7: AF is cleared.
However, 9+8=17=11h, which is greater than 0Fh: AF is set.
According to Intel documentation, AAA/AAS are only useful after an ADD/SUB instruction; actually we can use AAA and AAS after any arithmetic instruction (they all modify AF), including INC/DEC and CMP.
So, let's take a look at what AAA does. Here is the pseudo-code equivalent (from Intel):
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
AL = (AL + 6) AND 0FH;
AH = (AH + 1);
AF = 1;
CF = 1;
ELSE
AF = 0;
CF = 0;
FI;
That's true for a typical Intel documentation compliant use like this one:
mov al,6
add al,9 ;al=15=0Fh
aaa ;al=21=15h => it's in decimal!
But what happens if you write something like:
mov ax,00F6h
add al,9 ;ax=255=00FFh
aaa ;ax=507=0205h
If you stick to Intel documentation, we should have ax=0105h at the end, not 0205h! Well, the Intel pseudo-code is wrong in that case. I think it was the way the 8086 worked, but newer processors use this pseudo-code:
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
AX = (AX + 0106h);
AF = 1;
CF = 1;
ELSE
AF = 0;
CF = 0;
FI;
AL = AL AND 0FH;
For AAS, it's exactly the same problem, you just have to replace the '+' with a '-'.
What about the flags? AF and CF are set as described above. OF stays unchanged, SF is cleared.
ZF is set if AX=0, PF is set depending on AL (pretty logical, isn't it?).
AAD - AAM
These are the most interesting instructions and are described by Intel as 'ASCII adjust AX before division/after multiplication'.
They take 'x' as a byte argument. It used to be undocumented, but it's now in Intel's documentation.
AAD : AL = AH*x + AL AH = 0
AAM : AH = AL/x AL = AL mod x
What Intel doesn't say is that all flags (SF, ZF, PF, OF, CF, AF) are set depending on the result, exactly as with any other arithmetic instruction.
For AAM, the flags reflect the result of the 'mod' operation (AL mod x). It means that SF, ZF and PF are set according to AL, and that OF, CF and AF are cleared.
For AAD, the flags reflect the result of the '+' operation (AH*x)+AL.
So, when using 8 bit values, AAD and AAM are good alternatives to MUL and DIV: shorter code and opportunity to use flags.
Used with parameters like 0 or 1, they are often useful short ways to do ADD, CMP and MOV combinations:
+----+-------+-----------------------+
| AH | AL | Equivalence |
+-------+----+-------+-----------------------+
| AAD 0 | 0 | AL | mov ah,0 ; cmp al,0 |
| AAM 0 | | | Division by zero |
| AAD 1 | 0 | AL+AH | add al,ah ; mov ah,0 |
| AAM 1 | AL | 0 | mov ah,al ; and al,0 |
+-------+----+-------+-----------------------+
Simple Examples
So, what can we do with those AAx instructions? Here are some simple examples.
1. Test the value in AL
To test whether AL is 0 or 1, one single-byte instruction is enough:
aaa
jz @zero
2. How to test bounds
If you want to test AL against two values, you would do something like:
cmp al,MIN
jz @bound_reached
cmp al,MAX
jz @bound_reached
Using AAM, it's shorter :
sub al,MIN
aam MAX-MIN
jz @bound_reached
3. Discrete homotethie AL=AL+k[AL/x]
The combination of AAM and AAD is perfect in this case: after those two instructions, AL is increased by K times the integer part of (AL/x).
aam X
aad X+K
It's often a nice way to replace a CMP/Jcc/ADD instruction block.
For example, if you want to change the case of an ASCII character in [0-9][A-Z][a-z], you could write:
aam 40h
aad 60h ;to lower case
aam 60h
aad 40h ;to upper case
Complex Example
Well, if you are still not convinced by the AAx instructions, here is an example that might be very useful.
Imagine we know AL is one of those ten (randomly chosen) values: 01h, 25h, 39h, B3h, 78h, C4h, 6Bh, A6h, 12h, and F0h. We want to test in which set it is: A={25h, 6Bh, 78h, B3h, C4h} or B={01h, 12h, 39h, A6h, F0h}?
Of course, we could compare AL with each value in A:
cmp al,25h
jz @set_A
cmp al,6Bh
jz @set_A
cmp al,78h
jz @set_A
cmp al,0B3h
jz @set_A
cmp al,0C4h
jz @set_A
@set_B: ...
...
@set_A: ...
...
But we can also do this using AAD and AAM. We also need a flag set by the AAD instruction with a probability of about 1/2: both SF and PF are suitable.
So, we just need a dummy little C program to try all 256*256*4 possibilities and find one solution of the following form:
aam X
aad Y
jcc @set_A
with jcc being jp, jnp, js or jns.
This program looks for solutions using the parity flag or the sign flag...
#include <stdio.h>
#define byte unsigned int
#define NR 10
byte al,ah,x,y,k,found;
//Values array
byte set[NR]={0x01,0x25,0x39,0xB3,0x78,0xC4,0x6B,0xA6,0x12,0xF0};
//Results array : 0 for set_A, 1 for set_B
byte res[NR]={1, 0, 1, 0, 0, 0, 0, 1, 1, 1};
//P flag array
byte p[NR] ={0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
//S flag array
byte s[NR] ={0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
main () {
for (x=1; x<256; x++) {
for (y=0; y<256; y++) {
for (k=0; k<NR; k++) {
p[k]=0;
al = set[k] % x;
ah = set[k] / x;
al = (al+ah*y)%256;
ah = 0;
if (al<0x80) s[k]=0; else s[k]=1;
while (al) { p[k]+=al%2; al>>=1; }
p[k]%=2;
}
found=1;
for (k=1; k<NR; k++) {
if ((p[k]^res[k])!=(p[0]^res[0])) found=0;
}
if (found) {
printf ("X=%d Y=%d ",x,y);
if (p[0]^res[0]) printf ("using JNP\n");
else printf ("using JP\n");
}
found=1;
for (k=1; k<NR; k++) {
if ((s[k]^res[k])!=(s[0]^res[0])) found=0;
}
if (found) {
printf ("X=%d Y=%d ",x,y);
if (s[0]^res[0]) printf ("using JNS\n");
else printf ("using JS\n");
}
}
}
}
Here we get 37 solutions; this one for example:
aam 0Dh
aad 11h
jp @set_A
You want to verify ? Here we go:
AH:AL 00:01 00:25 00:39 00:B3 00:78 00:C4 00:6B 00:A6 00:12 00:F0
aam 00:01 02:0B 04:05 0D:0A 09:03 0F:01 08:03 0C:0A 01:05 12:06
aad 00:01 00:2D 00:49 00:E7 00:9C 00:00 00:8B 00:D6 00:16 00:38
PF 1 0 1 0 0 0 0 1 1 1
The weakest point is that it doesn't help making the code clear and easy to understand... or is this a nice feature?
Conclusion
I hope it's now obvious that AAx instructions are powerful. Of course, AAM and AAD are slow, and if you need speed, in most cases you have to forget it. But they are actually very short macro instructions: 2 bytes for a MUL-ADD, that's fantastic!
Credits: http://www.x86.org/secrets/opcodes.htm
For any comments, suggestions or questions, contact me.